Modern autonomous driving systems are characterized by modular tasks carried out in sequential order, i.e., perception, prediction, and planning. As sensors and hardware improve, there is a growing trend toward devising a system that can perform a wide diversity of tasks to fulfill higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks or designing a multi-task paradigm with separate heads. These may suffer from accumulated errors or negative transfer effects. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car. Oriented at this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all of them contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework to date that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage the advantages of each module and to provide complementary feature abstractions for agent interaction from a global perspective. Tasks communicate via a unified query design to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, this philosophy is shown to surpass previous state-of-the-art methods by a large margin in all aspects. The full suite of code and models will be made available to facilitate future research in the community.
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has, and what we can do with it. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concept (category, attribute, affordance), as well as the causal relations among the three levels. By analyzing the causal structure of OCL, we present a baseline, Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
Unsupervised domain adaptation (UDA) aims to learn a model on a labeled source domain that performs well on an unlabeled target domain. In the field of medical image segmentation, most existing UDA methods depend on adversarial learning to address the domain gap between different image modalities, which is ineffective due to its complicated training process. In this paper, we propose a simple yet effective UDA method based on frequency and spatial domain transfer under a multi-teacher distillation framework. In the frequency domain, we first introduce the non-subsampled contourlet transform to identify domain-invariant and domain-variant frequency components (DIFs and DVFs), and then keep the DIFs unchanged while replacing the DVFs of the source-domain images with those of the target-domain images to narrow the domain gap. In the spatial domain, we propose a batch-momentum-update-based histogram matching strategy to reduce the domain-variant image style bias. Experiments on two cross-modality medical image segmentation datasets (cardiac, abdominal) show that our proposed method achieves superior performance compared to state-of-the-art methods.
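To make the frequency-transfer idea concrete, the sketch below swaps the low-frequency band of a source image with that of a target image. This is a simplified stand-in: it uses a plain 2-D FFT with the central band treated as "domain-variant", whereas the paper uses the non-subsampled contourlet transform to identify DIFs and DVFs; the function name and the band-selection rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def swap_low_frequency(source_img, target_img, beta=0.1):
    """Keep the source image's high-frequency components (treated here as
    domain-invariant) and replace its central low-frequency band (treated
    as domain-variant) with the target image's, then invert the transform.
    `beta` controls the half-width of the swapped band as a fraction of
    the image size."""
    fs = np.fft.fftshift(np.fft.fft2(source_img))
    ft = np.fft.fftshift(np.fft.fft2(target_img))
    h, w = source_img.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    cy, cx = h // 2, w // 2
    # Replace the central (low-frequency) block of the source spectrum
    fs[cy - bh:cy + bh, cx - bw:cx + bw] = ft[cy - bh:cy + bh, cx - bw:cx + bw]
    return np.fft.ifft2(np.fft.ifftshift(fs)).real
```

With `beta` small, only coarse style/intensity statistics migrate from the target image while source anatomy is preserved, which is the intuition behind narrowing the domain gap before training the segmentation network.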
Modern supervised neural network models require a large amount of manually labeled data, which makes the construction of domain-specific knowledge graphs time-consuming and labor-intensive. In parallel, although there has been much research on named entity recognition and relation extraction based on distantly supervised learning, constructing a domain-specific knowledge graph from large collections of textual data without manual annotation remains an urgent problem. In response, we propose an integrated framework for adapting and re-learning knowledge graphs from one coarse domain (biomedical) to a finer-grained domain (oncology). In this framework, we apply distant supervision for cross-domain knowledge graph adaptation. Consequently, no manual data annotation is required to train the model. We introduce a novel iterative training strategy to facilitate the discovery of domain-specific named entities and triples. Experimental results indicate that the proposed framework can perform domain adaptation and knowledge graph construction efficiently.
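The general shape of such an iterative distant-supervision loop can be sketched as follows. This is a hypothetical outline, not the paper's algorithm: the helpers `train` and `extract`, the string-matching alignment, and the fixed round count are all illustrative assumptions.

```python
def iterative_adaptation(corpus, seed_kg, train, extract, n_rounds=3):
    """Hypothetical iterative distant-supervision loop: align the current
    knowledge graph against the corpus to auto-label sentences, retrain the
    extractor on those labels, and add newly extracted triples back to the
    KG so the next round can label more data. Triples are (head, relation,
    tail) tuples; `train` and `extract` stand in for the NER/RE models."""
    kg = set(seed_kg)
    model = None
    for _ in range(n_rounds):
        # Distant supervision: a sentence mentioning both head and tail of
        # a known triple is taken as (noisy) evidence for that triple.
        labeled = [(sent, triple) for sent in corpus
                   for triple in kg
                   if all(e in sent for e in triple[:1] + triple[2:])]
        model = train(labeled)             # retrain on distantly labeled data
        kg |= set(extract(model, corpus))  # grow the KG with new triples
    return kg, model
```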
Causal mediation analysis can unpack the black box of causality and is therefore a powerful tool for disentangling causal pathways in the biomedical and social sciences, as well as for evaluating machine learning fairness. To reduce bias in estimating natural direct and indirect effects in mediation analysis, we propose a new method called DeepMed that uses deep neural networks (DNNs) to cross-fit the infinite-dimensional nuisance functions in the efficient influence functions. We obtain novel theoretical results showing that DeepMed (1) can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and (2) can adapt to certain low-dimensional structures of the nuisance functions, significantly advancing the existing literature on DNN-based semiparametric causal inference. Extensive synthetic experiments are conducted to support our findings and also to expose the gap between theory and practice. As a proof of concept, we apply DeepMed to analyze two real datasets on machine learning fairness and reach conclusions consistent with previous findings.
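For reference, the natural direct and indirect effects targeted here are standardly defined via nested potential outcomes, with $Y(t, m)$ the outcome under treatment $t$ and mediator value $m$, and $M(t)$ the mediator under treatment $t$:

```latex
\mathrm{NDE} = \mathbb{E}\bigl[\,Y(1, M(0)) - Y(0, M(0))\,\bigr], \qquad
\mathrm{NIE} = \mathbb{E}\bigl[\,Y(1, M(1)) - Y(1, M(0))\,\bigr],
```

so that the total effect decomposes as $\mathbb{E}[Y(1, M(1)) - Y(0, M(0))] = \mathrm{NDE} + \mathrm{NIE}$. The nuisance functions being cross-fit are the outcome regression, the mediator density, and the propensity score that enter the efficient influence functions for these two estimands.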
Searching in denied environments is challenging for swarm robots, since assistance from GNSS, mapping, data sharing, and central processing is not available. However, cooperating like animals by using olfactory and auditory cues may be an important way to improve swarm cooperation. In this paper, an olfactory-auditory-based bug algorithm (OA-Bug) is proposed for a swarm of autonomous robots to explore denied environments. A simulation environment is built to measure the performance of OA-Bug. The coverage of search tasks using OA-Bug can reach 96.93%, a maximum improvement of 40.55% compared with the similar algorithm SGBA. Furthermore, experiments are conducted on real swarm robots to demonstrate the effectiveness of OA-Bug. The results show that OA-Bug can improve the performance of swarm robots in denied environments.
DeepMind's Game Theory and Multi-Agent team researches several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, and to training 3-D humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments, and to use these benchmarks to advance our understanding. Here, we summarize the recent work of our team and present a taxonomy that we believe highlights many important open challenges in multi-agent research.
Deep learning models have shown impressive results in a variety of time series forecasting tasks, where modeling the conditional distribution of the future given the past is the essence. However, when this conditional distribution is non-stationary, it poses challenges for these models to learn consistently and to predict accurately. In this work, we propose a new method to model non-stationary conditional distributions by clearly decoupling stationary conditional distribution modeling from non-stationary dynamics modeling. Our method is based on a Bayesian dynamic model that can adapt to changes in the conditional distribution, and a deep conditional distribution model that can handle large multivariate time series using a factorized output space. Our experimental results on synthetic and popular public datasets show that our model can adapt to non-stationary time series better than state-of-the-art deep learning solutions.
In this report, we present our solution to the Occupancy and Flow Prediction challenge in the Waymo Open Dataset Challenges at CVPR 2022, which ranks first on the leaderboard. We have developed a novel hierarchical spatial-temporal network featuring a spatial-temporal encoder, a latent-variable-enriched multi-scale aggregator, and a recursive hierarchical 3D decoder. We use multiple losses, including focal loss and a modified flow loss, to efficiently guide the training process. Our method achieves a Flow-Grounded Occupancy AUC of 0.8389 and outperforms all other teams on the leaderboard.
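The focal loss mentioned above is the standard binary form of Lin et al.; the report's modified flow loss is not specified here, so only the focal term is sketched, as a minimal NumPy reference (the function name and signature are illustrative):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples by
    the factor (1 - p_t)^gamma so training focuses on hard examples.
    p: predicted probabilities, y: binary labels in {0, 1}."""
    p = np.clip(p, 1e-7, 1 - 1e-7)          # numerical stability
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -np.mean(alpha_t * (1 - pt) ** gamma * np.log(pt))
```

With `gamma = 0` and `alpha = 0.5` this reduces to half the usual binary cross-entropy; increasing `gamma` suppresses the contribution of confident predictions.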
Parotid gland tumors account for about 2% to 10% of head and neck tumors. Preoperative tumor localization, differential diagnosis, and the subsequent selection of an appropriate treatment for parotid gland tumors are critical. However, the relative rarity of these tumors and their highly dispersed tissue types leave an unmet need for fine-grained differential diagnosis of such tumor lesions based on preoperative radiomics. Recently, deep learning methods have developed rapidly, especially Transformers, which have beaten traditional convolutional neural networks in computer vision, and many new Transformer-based networks have been proposed for computer vision tasks. In this study, multi-center multimodal MRI images were collected, and the Transformer-based Swin-Unet was used. MRI images of the STIR, T1, and T2 modalities were combined into three-channel data to train the network. We achieved segmentation of the regions of interest of the parotid gland and tumors. On the test set, the model achieved a DSC of 88.63%, an MPA of 99.31%, an MIoU of 83.99%, and an HD of 3.04. A series of comparison experiments was then designed in this paper to further validate the segmentation performance of the algorithm.
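The three-channel input construction described above can be sketched as stacking co-registered slices of the three MRI modalities, one per channel. The per-slice min-max normalization below is an assumed preprocessing choice, not stated in the abstract:

```python
import numpy as np

def to_three_channel(stir, t1, t2):
    """Combine co-registered STIR, T1, and T2 slices into a single
    three-channel array (H, W, 3), normalizing each slice to [0, 1] so
    no modality dominates by raw intensity scale."""
    def norm(x):
        x = x.astype(np.float32)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return np.stack([norm(stir), norm(t1), norm(t2)], axis=-1)
```

The resulting array has the same layout as an RGB image, which lets standard vision backbones such as Swin-Unet consume multimodal MRI without architectural changes.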